- Anne Owen, Nik Lomax, LIDA
- Janet Boutell, Jim Lewsey, University of Glasgow
- Gerry McCartney, Jane Parkinson, NHS Health Scotland
26 June 2018
Source: https://www.slideshare.net/jnicotra/gestalt-theory-for-visual-design
Data visualisations
Mapping rules:
etc
etc
This set of mapping rules…
| Variable in Dataset | Graphical Feature |
|---|---|
| % of population providing free care | Position along vertical axis |
| % of population with health needs | Position along horizontal axis |
| Size of population in areal unit | Size of circle |
| Whether in North or South of England | Colour of bubble |
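The mapping rules above can be sketched as a function that turns one areal unit into the graphical attributes of one bubble. This is a minimal illustration, not the authors' code. The `Area` fields, the colour palette and the square-root radius scaling (so that bubble *area*, not radius, grows with population) are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Area:
    """One areal unit; field names are hypothetical, not from the census files."""
    pct_free_care: float     # % of population providing free care
    pct_health_needs: float  # % of population with health needs
    population: int          # size of population in the areal unit
    region: str              # "North" or "South"


# Assumed palette: any two distinguishable colours would do
REGION_COLOURS = {"North": "blue", "South": "orange"}


def to_bubble(area: Area, max_population: int, max_radius: float = 20.0) -> dict:
    """Apply the mapping rules: each data variable drives one graphical feature."""
    # Scale radius by the square root so bubble area is proportional to population
    radius = max_radius * math.sqrt(area.population / max_population)
    return {
        "x": area.pct_health_needs,  # position along horizontal axis
        "y": area.pct_free_care,     # position along vertical axis
        "r": radius,                 # size of circle
        "colour": REGION_COLOURS[area.region],
    }
```

Keeping the mapping in one place means a different rule set (say, swapping the axes) is a one-line change, while the data stay untouched.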
Applied to 2001 English/Welsh Census data…
…Produces the following
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1326106/
Population data are data where:
From the Human Mortality Database
```
# A tibble: 1,445,886 x 6
   country  year   age sex    death_count population_count
   <chr>   <int> <int> <chr>        <dbl>            <dbl>
 1 AUS      1921     0 female      3842.            62758.
 2 AUS      1921     1 female       586.            57766.
 3 AUS      1921     2 female       390.            57014.
 4 AUS      1921     3 female       254.            58307.
 5 AUS      1921     4 female       176.            58711.
 6 AUS      1921     5 female       146.            59875.
 7 AUS      1921     6 female       128.            61023.
 8 AUS      1921     7 female       112.            59465.
 9 AUS      1921     8 female        97.0           57746.
10 AUS      1921     9 female        83.8           56186.
# ... with 1,445,876 more rows
```
1.4 million rows!
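The models that follow regress `log(death_rate)` on `sex` and `years_since_first`, but the table above holds only counts. A minimal sketch of deriving those two columns, assuming `death_rate = death_count / population_count` and `years_since_first = year` minus the first year observed for that country. Both definitions are assumptions, not taken from the slides. Rows with no population are skipped because the rate is undefined there.

```python
def add_derived_columns(rows):
    """Add death_rate and years_since_first to HMD-style rows (list of dicts).

    Assumed definitions (hypothetical, for illustration):
      death_rate        = death_count / population_count
      years_since_first = year - first year observed for that country
    """
    # First pass: earliest year seen for each country
    first_year = {}
    for r in rows:
        c = r["country"]
        first_year[c] = min(first_year.get(c, r["year"]), r["year"])

    # Second pass: copy each usable row and attach the derived columns
    out = []
    for r in rows:
        if r["population_count"] <= 0:
            continue  # rate undefined; skip the row
        new = dict(r)
        new["death_rate"] = r["death_count"] / r["population_count"]
        new["years_since_first"] = r["year"] - first_year[r["country"]]
        out.append(new)
    return out


rows = [
    {"country": "AUS", "year": 1921, "age": 0, "sex": "female",
     "death_count": 3842.0, "population_count": 62758.0},
    {"country": "AUS", "year": 1921, "age": 1, "sex": "female",
     "death_count": 586.0, "population_count": 57766.0},
]
derived = add_derived_columns(rows)
```

Note that `log(death_rate)` also needs a strictly positive rate, so cells with zero deaths would need handling before fitting.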
```
Call:
lm(formula = log(death_rate) ~ sex, data = .)

Residuals:
    Min      1Q  Median      3Q     Max
-3.5977 -1.1075 -0.1401  1.2590  3.2585

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.60459    0.02012 -179.148  < 2e-16 ***
sexmale      0.22777    0.02845    8.005 1.35e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.326 on 8682 degrees of freedom
Multiple R-squared:  0.007326, Adjusted R-squared:  0.007212
F-statistic: 64.07 on 1 and 8682 DF,  p-value: 1.353e-15
```
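With `sex` as the only predictor, the fitted model reduces to two group means on the log scale. The intercept (-3.60459) is the mean log death rate for the reference level (female), and `sexmale` (0.22777) is the male minus female difference. The R-squared of 0.007 shows how little of the variation sex alone explains. A minimal sketch of that equivalence, assuming the log rates have already been split by sex:

```python
from statistics import mean


def fit_sex_dummy(log_rate_female, log_rate_male):
    """Least-squares fit of y ~ sex with 'female' as the reference level.

    With a single 0/1 dummy the OLS intercept equals the reference-group mean,
    and the dummy coefficient equals the difference between the group means.
    """
    intercept = mean(log_rate_female)
    sexmale = mean(log_rate_male) - intercept
    return intercept, sexmale
```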
```
Call:
lm(formula = log(death_rate) ~ sex + years_since_first, data = .)

Residuals:
     Min       1Q   Median       3Q      Max
-2.62526 -0.38316  0.01859  0.40022  1.70713

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.9755770  0.0246242   39.62   <2e-16 ***
sexmale            0.2277717  0.0120983   18.83   <2e-16 ***
years_since_first -0.0234263  0.0001181 -198.36   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5637 on 8681 degrees of freedom
Multiple R-squared:  0.8206, Adjusted R-squared:  0.8205
F-statistic: 1.985e+04 on 2 and 8681 DF,  p-value: < 2.2e-16
```
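Because the response is `log(death_rate)`, each coefficient acts multiplicatively on the rate itself: `exp(b) - 1` is the proportional change per unit of the predictor. A small sketch applying that conversion to the two estimates in this linear-trend model:

```python
import math


def pct_change(coef: float) -> float:
    """Percentage change in death_rate implied by a coefficient on log(death_rate)."""
    return 100.0 * (math.exp(coef) - 1.0)


male_effect = pct_change(0.2277717)    # sexmale
annual_trend = pct_change(-0.0234263)  # years_since_first, per year
```

On these estimates, male rates come out roughly 25.6% higher than female rates, and rates fall by roughly 2.3% a year, under the assumption of a straight-line trend on the log scale.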
```
Call:
lm(formula = log(death_rate) ~ sex + poly(years_since_first,
    2), data = .)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63102 -0.25707 -0.03739  0.22951  1.37089

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -3.605e+00  5.590e-03 -644.88   <2e-16 ***
sexmale                      2.278e-01  7.905e-03   28.81   <2e-16 ***
poly(years_since_first, 2)1 -1.118e+02  3.683e-01 -303.59   <2e-16 ***
poly(years_since_first, 2)2 -3.976e+01  3.683e-01 -107.96   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3683 on 8680 degrees of freedom
Multiple R-squared:  0.9234, Adjusted R-squared:  0.9234
F-statistic: 3.488e+04 on 3 and 8680 DF,  p-value: < 2.2e-16
```
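R's `poly()` supplies orthogonal polynomial columns rather than raw powers of `years_since_first`. So the `poly(...)1` and `poly(...)2` estimates (-111.8 and -39.76) are not per-year slopes, and the columns are centred and mutually orthogonal. This is consistent with the intercept (-3.605) sitting close to the sex-only model's value rather than the 0.976 of the raw linear-trend model, where the intercept is the prediction at `years_since_first = 0`. The sketch below shows the idea using modified Gram-Schmidt on centred powers. R itself works through a QR decomposition, so this illustrates the construction rather than reimplementing `poly()`.

```python
import math


def orthogonal_poly(x, degree):
    """Orthonormal polynomial columns for x, in the spirit of R's poly().

    x is centred, then powers 1..degree are orthogonalised against the
    constant column and each earlier column (modified Gram-Schmidt) and
    scaled to unit length. degree must be below the number of distinct x.
    """
    m = sum(x) / len(x)
    xc = [xi - m for xi in x]
    basis = [[1.0] * len(x)]  # constant column (intercept)
    for d in range(1, degree + 1):
        v = [xi ** d for xi in xc]
        for b in basis:
            # remove the component of v along each earlier column
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant, as poly() does
```

Orthogonality is also why the first two polynomial estimates are identical in the quadratic and cubic fits: adding the cubic column does not change the projection onto the earlier ones.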
```
Call:
lm(formula = log(death_rate) ~ sex + poly(years_since_first,
    3), data = .)

Residuals:
     Min       1Q   Median       3Q      Max
-1.53729 -0.24378 -0.03294  0.22219  1.31336

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -3.605e+00  5.488e-03 -656.77   <2e-16 ***
sexmale                      2.278e-01  7.762e-03   29.35   <2e-16 ***
poly(years_since_first, 3)1 -1.118e+02  3.616e-01 -309.18   <2e-16 ***
poly(years_since_first, 3)2 -3.976e+01  3.616e-01 -109.95   <2e-16 ***
poly(years_since_first, 3)3 -6.509e+00  3.616e-01  -18.00   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3616 on 8679 degrees of freedom
Multiple R-squared:  0.9262, Adjusted R-squared:  0.9261
F-statistic: 2.722e+04 on 4 and 8679 DF,  p-value: < 2.2e-16
```
```
Call:
lm(formula = log(death_rate) ~ sex * poly(years_since_first,
    2), data = .)

Residuals:
     Min       1Q   Median       3Q      Max
-1.62939 -0.25635 -0.03894  0.22798  1.37746

Coefficients:
                                      Estimate Std. Error  t value
(Intercept)                         -3.605e+00  5.587e-03 -645.155
sexmale                              2.278e-01  7.901e-03   28.827
poly(years_since_first, 2)1         -1.127e+02  5.207e-01 -216.458
poly(years_since_first, 2)2         -3.907e+01  5.207e-01  -75.039
sexmale:poly(years_since_first, 2)1  1.769e+00  7.363e-01    2.402
sexmale:poly(years_since_first, 2)2 -1.385e+00  7.363e-01   -1.881
                                    Pr(>|t|)
(Intercept)                           <2e-16 ***
sexmale                               <2e-16 ***
poly(years_since_first, 2)1           <2e-16 ***
poly(years_since_first, 2)2           <2e-16 ***
sexmale:poly(years_since_first, 2)1   0.0163 *
sexmale:poly(years_since_first, 2)2   0.0600 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3682 on 8678 degrees of freedom
Multiple R-squared:  0.9235, Adjusted R-squared:  0.9234
F-statistic: 2.095e+04 on 5 and 8678 DF,  p-value: < 2.2e-16
```
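Writing `sex * poly(years_since_first, 2)` instead of `sex + ...` adds two product columns, `sexmale:poly(...)1` and `sexmale:poly(...)2`, which let men and women follow differently shaped time trends. Here those terms are small (p = 0.0163 and 0.0600) and the adjusted R-squared stays at 0.9234. A sketch of how one design-matrix row is built under each formula; `p1` and `p2` are hypothetical names for that row's values of the two polynomial columns.

```python
def design_row(is_male: bool, p1: float, p2: float, interaction: bool) -> list:
    """One row of the design matrix for log(death_rate) ~ sex (+ or *) poly(t, 2).

    Columns: intercept, sexmale, poly1, poly2 and, with interaction=True,
    sexmale:poly1 and sexmale:poly2 (products of the dummy and each column).
    """
    male = 1.0 if is_male else 0.0
    row = [1.0, male, p1, p2]
    if interaction:
        # product terms are zero for the reference level (female)
        row += [male * p1, male * p2]
    return row
```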
```
Call:
lm(formula = log(death_rate) ~ sex + is_scotland + poly(years_since_first,
    2), data = .)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63147 -0.25585 -0.03673  0.22849  1.36892

Coefficients:
                              Estimate Std. Error  t value Pr(>|t|)
(Intercept)                 -3.603e+00  5.641e-03 -638.724   <2e-16 ***
sexmale                      2.278e-01  7.903e-03   28.822   <2e-16 ***
is_scotlandTRUE             -4.947e-02  2.124e-02   -2.329   0.0199 *
poly(years_since_first, 2)1 -1.119e+02  3.687e-01 -303.387   <2e-16 ***
poly(years_since_first, 2)2 -3.982e+01  3.690e-01 -107.907   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3682 on 8679 degrees of freedom
Multiple R-squared:  0.9235, Adjusted R-squared:  0.9234
F-statistic: 2.618e+04 on 4 and 8679 DF,  p-value: < 2.2e-16
```
```
Analysis of Variance Table

Model 1: log(death_rate) ~ sex
Model 2: log(death_rate) ~ sex + years_since_first
Model 3: log(death_rate) ~ sex + poly(years_since_first, 2)
Model 4: log(death_rate) ~ sex + poly(years_since_first, 3)
Model 5: log(death_rate) ~ sex * poly(years_since_first, 2)
Model 6: log(death_rate) ~ sex + is_scotland + poly(years_since_first,
    2)
  Res.Df     RSS Df Sum of Sq          F  Pr(>F)
1   8682 15261.4
2   8681  2758.5  1   12502.9 92243.6938 < 2e-16 ***
3   8680  1177.5  1    1581.1 11664.6875 < 2e-16 ***
4   8679  1135.1  1      42.4   312.5759 < 2e-16 ***
5   8678  1176.2  1     -41.1
6   8679  1176.8 -1      -0.5     3.8829 0.04881 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
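Each F value in the table compares a model with the one on the previous row. The numerator is the drop in RSS per residual degree of freedom spent. When several `lm` fits are passed to `anova()`, the denominator is the residual mean square of the model with the fewest residual degrees of freedom, here Model 5 (RSS 1176.2 on 8678 df). The comparison only makes sense when one model is nested in the other, which is why row 5 (Model 4 to Model 5, a negative sum of squares) has no F. A sketch reproducing row 2 from the printed RSS values:

```python
def anova_f(rss_small, df_small, rss_large, df_large, rss_ref, df_ref):
    """F statistic for one row of a multi-model anova() table of lm fits.

    Numerator: drop in RSS per residual degree of freedom spent going from
    the smaller model to the larger one. Denominator: residual mean square
    of the reference model (the one with the fewest residual df).
    """
    delta_df = df_small - df_large
    delta_rss = rss_small - rss_large
    return (delta_rss / delta_df) / (rss_ref / df_ref)


# Model 1 -> Model 2, scaled by Model 5's residual mean square
f_12 = anova_f(15261.4, 8682, 2758.5, 8681, 1176.2, 8678)
```

The result agrees with the printed 92243.69 to within the rounding of the displayed RSS values.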
How? By representing the surface on a map
| Data Variable | Aesthetic |
|---|---|
| Longitude | Horizontal position |
| Latitude | Vertical position |
| Elevation | Colour/shade/contour lines |
| Data Variable | Aesthetic |
|---|---|
| Year | Horizontal position |
| Age | Vertical position |
| Mortality rate | Colour/shade/contour lines |
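The second set of rules treats mortality like elevation on a map: year and age fix the position and the rate sets the shade. A minimal sketch of arranging long-format `(year, age, rate)` records into that age-by-year grid, assuming one rate per year/age cell. The helper names are hypothetical. Birth cohorts run along the diagonals of the grid, approximately `year - age`.

```python
def lexis_grid(records):
    """Arrange (year, age, rate) records into an age-by-year grid.

    Rows are ages (vertical position), columns are years (horizontal
    position), and each cell holds the rate that would be mapped to
    colour/shade. Missing year/age combinations are left as None.
    """
    years = sorted({y for y, _, _ in records})
    ages = sorted({a for _, a, _ in records})
    lookup = {(y, a): r for y, a, r in records}
    grid = [[lookup.get((y, a)) for y in years] for a in ages]
    return years, ages, grid


def birth_cohort(year: int, age: int) -> int:
    """Approximate birth cohort: cells on the same diagonal share a cohort."""
    return year - age
```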
| Parameter | Drug-related | Suicide | Alcohol |
|---|---|---|---|
| First year of effect | 1988 | 1987 | 1980 |
| First age affected | 15 | 17 | 15 |
| Peak age | 25 | 25 | 9 |
| First cohort affected | 1942 | 1938 | 1961 |
| Peak cohort | 1968 | 1964 | 1997 |
| Fit | Good | Good | Bad |
And finally: